RTC CDN Design
This article introduces how to build an RTC (real-time communication) CDN system that supports 10 million concurrent users. Voice-based RTC is quite common in daily life, for example talking with teammates in a game, or making a call through Facebook, DingTalk, or WeChat.
Background
I work on RTC development at a well-known company. Our product is a voice SDK that helps applications implement voice-related functions such as VoIP communication, speech-to-text, text-to-speech, and voice messages. The product is about 10 years old, has roughly a billion users so far, and they are distributed all over the world.
I am a client-side developer, but in the early stage of our product back-end developers were in short supply, so I participated in the design and development of the CDN service and was responsible for the scheduling service and the DB service.
How do we implement a voice chat function? We will not discuss the one-on-one call scenario here; let's focus on the multi-party voice scenario. Similar to a meeting, everyone enters the same meeting room. The CDN is the container of the room: one CDN server hosts many rooms, and a user enters a room by its name.
In addition to rooms, we also need to help users manage voice devices and process voice data, so we wrap an SDK to make voice chat easy to implement. Our users are distributed all over the world, so our CDN servers must be deployed all over the world as well.
Our SDK accesses the CDN service in two steps:
- Get the addresses of the CDN server and the room information from the Scheduler server;
- Shake hands with the CDN server and transmit voice data.
The RTC CDN has two main parts: the Scheduler services and the CDN services. The Scheduler services tell the SDK which CDN server hosts the room, and the CDN service transmits voice data between the members of a room. The topology is as below:
Design
I will first introduce how to implement the functionality, and then the non-functional factors that must be considered, for example scalability and availability.
Access
When designing, we had to consider many factors. On the one hand, we did not need to start from scratch: our company already had ready-made components such as a gateway, protocol and socket wrappers, and a control center. On the other hand, our product has several functions (VoIP, messaging, STT, ...), and every function should be an independent service, yet they all require unified access. So we built a scheduler service that routes the different kinds of requests to the corresponding services.
Gateway
The Gateway service is the gatekeeper for users of our services; it shields our services from direct attack. Of course, it can also take on more responsibilities.
Firstly, the gateway can play the role of a load balancer: one gateway server can redirect requests to different RSs (real servers), and different gateway servers can redirect requests to the same RS. We usually return several gateway addresses to the SDK, which tries them one by one until it finds a usable one.
Secondly, the gateway helps users reach an RS without crossing internet operators. There is a fee-settlement relationship between different network operators, cost budgets for cross-operator traffic are limited, and so cross-operator bandwidth is also insufficient. When the number of users rises during peak hours (20:30 ~ 23:00), cross-operator access may cause network freezes, poor connectivity, and high packet loss. Therefore, we should avoid cross-operator access.
Thirdly, the gateway is transparent to both users and RS servers; the principle is similar to the IPIP protocol. Our CDN services are based on a layer-4 protocol. A layer-4 proxy works mainly at the transport layer of the OSI model, which handles only the delivery of the message (Gateway:ip+port <–> RS:ip+port), regardless of its content.
Scheduler service
The Scheduler service can be thought of as a proxy or router: it routes our product’s requests to the corresponding services. It also has other responsibilities, as below:
- Authorization: every request must be legitimate. The scheduler service verifies the signature of the digest; if it does not match, the scheduler replies with an error response.
- Rate limiting: although the gateway already does this loosely, we still need to limit the request rate for specific businesses.
- Billing: yes, our services are not free; we count user requests and charge the businesses.
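The signature check in the authorization step can be sketched as an HMAC comparison over the request body. This is a minimal illustration assuming a shared secret per business account; the field names and helpers here are hypothetical, not the real protocol.

```python
import hashlib
import hmac

def sign(secret: bytes, body: bytes) -> str:
    """Compute the digest signature the SDK would attach to a request."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()

def verify(secret: bytes, body: bytes, signature: str) -> bool:
    """Recompute the signature server-side and compare in constant time."""
    return hmac.compare_digest(sign(secret, body), signature)

# Hypothetical request: secret and body formats are illustrative only.
secret = b"business-secret-key"
body = b'{"room": "team-alpha", "uid": 42}'
sig = sign(secret, body)

ok = verify(secret, body, sig)                      # True: untampered request
tampered = verify(secret, b'{"room": "evil"}', sig) # False: body was altered
```

`hmac.compare_digest` is used instead of `==` so the comparison time does not leak how many leading characters of the signature matched.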
The DB (database) server is accessed when the scheduling service authenticates a request. A business registers an account on our website, and the account is written into the DB server. Speaking of the DB, we must pay attention to performance and availability; the answers are caching and master-slave DBs.
The scheduler reads the business account from the cache; if the account is not in the cache, it queries the DB server and writes the result into the cache. If the master DB server is down, we try a slave DB server; usually we have two slave DB servers.
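The lookup path above can be sketched as a cache-aside read with replica failover. The class and client interfaces below are hypothetical stand-ins for the internal cache and DB clients, not the real ones.

```python
class AccountLookup:
    """Cache-aside read: cache first, then master DB, then slave DBs."""

    def __init__(self, cache, master_db, slave_dbs):
        self.cache = cache                  # any dict-like store
        self.dbs = [master_db] + list(slave_dbs)

    def get_account(self, business_id):
        account = self.cache.get(business_id)
        if account is not None:
            return account                  # cache hit
        for db in self.dbs:                 # master first, then slaves
            try:
                account = db.query(business_id)
            except ConnectionError:
                continue                    # replica down, try the next one
            if account is not None:
                self.cache[business_id] = account  # backfill the cache
            return account
        raise RuntimeError("all DB replicas unavailable")

class _DB:                                  # toy DB client for the demo
    def __init__(self, rows, up=True):
        self.rows, self.up = rows, up
    def query(self, key):
        if not self.up:
            raise ConnectionError("db down")
        return self.rows.get(key)

cache = {}
master = _DB({}, up=False)                  # simulate a master outage
slave = _DB({"acme": {"app_key": "k1"}})
lookup = AccountLookup(cache, master, [slave])
account = lookup.get_account("acme")        # served by the slave, backfills cache
```

On the next call for the same account, the cache answers directly and no DB replica is touched.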
This article focuses on RTC CDN design, so here we only care about the ability to route requests to the VOIP dispenser service.
Routing
When join-room requests reach the dispenser service, its responsibility is to route them to the CDN clusters that contain the corresponding rooms.
Request routing generally works in one of two ways, Round Robin and Hashing; many routing methods are variants of these, such as WLC (Weighted Least Connection) and WRR (Weighted Round Robin).
Round Robin
The Round Robin routing algorithm routes requests to the servers in turn: each schedule executes i = (i + 1) mod n and selects the i-th server. The advantage of the algorithm is its simplicity: it does not need to record the state of current connections, so it is stateless scheduling. The gateway and scheduler services mentioned above use this algorithm, and they are all stateless services.
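A minimal sketch of the algorithm, assuming a fixed server list (the server names are illustrative):

```python
class RoundRobin:
    """Pick servers in turn: each call executes i = (i + 1) mod n."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.i = -1                          # so the first pick is server 0

    def next_server(self):
        self.i = (self.i + 1) % len(self.servers)
        return self.servers[self.i]

rr = RoundRobin(["rs-0", "rs-1", "rs-2"])
picks = [rr.next_server() for _ in range(4)]
# picks == ["rs-0", "rs-1", "rs-2", "rs-0"]  (wraps around after n picks)
```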
Simple Hashing
The Simple Hashing routing algorithm routes requests to servers by hashing the request’s key (e.g. the room name): each schedule executes i = hash_string_to_int(key) mod n and selects the i-th server.
Simple Hashing seems suitable for mapping a room name to a CDN cluster, and indeed it works fine most of the time, except when a CDN cluster is added or removed, as below:
Expanding or shrinking changes the number of CDN clusters, which causes requests with the same room name to be routed to different CDN clusters. In other words, two room instances with the same name can exist in different clusters, and the users in them cannot communicate with each other.
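The re-mapping problem can be demonstrated with a short experiment: hash 1,000 room names with mod n, add one cluster, and count how many rooms land on a different cluster. MD5 is used here only as a stable stand-in for hash_string_to_int (Python's built-in hash() is salted per process).

```python
import hashlib

def hash_string_to_int(s: str) -> int:
    # Stable string hash; MD5 is an arbitrary but deterministic choice.
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

def pick_cluster(room: str, n_clusters: int) -> int:
    return hash_string_to_int(room) % n_clusters

rooms = [f"room-{i}" for i in range(1000)]
before = {r: pick_cluster(r, 4) for r in rooms}   # 4 CDN clusters
after = {r: pick_cluster(r, 5) for r in rooms}    # one cluster added
moved = sum(before[r] != after[r] for r in rooms)
print(f"{moved}/1000 rooms now map to a different cluster")
```

Going from mod 4 to mod 5, only the keys with h mod 4 == h mod 5 keep their cluster, which is roughly one room in five; the other ~80% are re-routed, splitting existing rooms across clusters.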
Consistent Hashing
Consistent hashing solves this horizontal scalability problem by ensuring that every time we scale up or down, we do not have to re-arrange all the keys or touch all the servers.
Consistent Hashing is a distributed hashing scheme that operates independently of the number of nodes in a distributed hash table by assigning them a position on an abstract circle, or hash ring.
We hash each request and place it on the ring depending on the output. Similarly, we also hash the nodes and place them on the same ring.
Hash(key1) = P1
Hash(key2) = P2
Hash(key3) = P3
…
Hash(keyN) = PN
Where,
key: Request/Node ID or IP.
P: Position on the hash ring.
N: Total range of the hash ring.
Using consistent hashing, only K/N data would require re-distributing.
R = K/N
Where,
R: Data that would require re-distribution.
K: Number of partition keys.
N: Number of nodes.
This allows servers and objects to scale without affecting the overall system.
Now, when a request comes in, we simply route it to the closest node in a clockwise (or, equivalently, counterclockwise) direction. This means that if a node is added or removed, only the keys between the new node and its nearest neighbor need to be re-routed.
In theory, consistent hashing should distribute the load evenly; in practice, however, it often does not. The load distribution is usually uneven, and one server may end up handling the majority of the requests, becoming a hotspot and, essentially, a bottleneck for the system. We can fix this by adding extra nodes, but that can be expensive.
In order to ensure a more evenly distributed load, we can introduce the idea of a virtual node, sometimes also referred to as a VNode.
Instead of assigning a single position to a node, the hash range is divided into multiple smaller ranges, and each physical node is assigned several of these smaller ranges. Each of these subranges is considered a VNode. Hence, virtual nodes are basically existing physical nodes mapped multiple times across the hash ring to minimize changes to a node’s assigned range.
Because VNodes spread the load more evenly across the physical nodes of the cluster by dividing the hash range into smaller subranges, they speed up the re-balancing process after nodes are added or removed, and they also reduce the probability of hotspots.
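A compact sketch of a ring with virtual nodes. MD5 is an assumed ring hash and names like `cluster-a` are illustrative; the real dispenser may differ in both.

```python
import bisect
import hashlib

def ring_hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    """Each physical node is placed on the ring `vnodes` times; a key
    maps to the first node position clockwise from its own hash."""

    def __init__(self, nodes, vnodes=100):
        self.ring = sorted(
            (ring_hash(f"{node}#{v}"), node)
            for node in nodes
            for v in range(vnodes)
        )
        self.positions = [pos for pos, _ in self.ring]

    def lookup(self, key: str) -> str:
        # bisect finds the first vnode clockwise; % wraps around the ring.
        i = bisect.bisect(self.positions, ring_hash(key)) % len(self.ring)
        return self.ring[i][1]

three = ConsistentHashRing(["cluster-a", "cluster-b", "cluster-c"])
four = ConsistentHashRing(["cluster-a", "cluster-b", "cluster-c", "cluster-d"])
rooms = [f"room-{i}" for i in range(1000)]
moved = sum(three.lookup(r) != four.lookup(r) for r in rooms)
```

With 100 virtual nodes per cluster, adding `cluster-d` re-routes roughly K/N = a quarter of the rooms, versus about four-fifths under simple mod-n hashing.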
CDN Services Routing
After hash scheduling by the Dispenser service, the request reaches the CDN Manager service, which returns the corresponding room information and CDN server addresses. The SDK then initiates the CDN handshake, and voice data can be transmitted once the handshake succeeds.
The Gateway, Scheduler, and CDN Manager services all use Round-Robin scheduling; only the Dispenser service uses Hashing.
The CDN Manager service is stateful: it holds the addresses and room information of all CDN servers in the current cluster. Rooms are created using the Round-Robin method; if a room already exists, join-room requests are routed to it using the Hashing method.
A CDN cluster contains a certain number of CDN servers and CDN Manager servers, and businesses can size their clusters according to their own characteristics. In our experience, a cluster of 50 CDN servers and 3 CDN managers is the most suitable; each server can carry 1,000 PCUs, and only 20 clusters are required for 10 million PCU services.
Voice data routing is simple: when users are in the same room, every member’s voice data is copied and forwarded to the other members. But what if there are millions of users in the same room and one server cannot carry that many? We have two options:
- Synchronize the addresses of the members of the same room through the RTC Room Manager service.
- Forward voice data between the servers hosting the same room through the RTC Room Manager service.
Considering that there are many members but few of them speak at the same time, it is more efficient to forward the voice data between servers than to synchronize the member addresses.
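The single-server forwarding step can be sketched as a simple fan-out, with in-memory queues standing in for the members' network connections (all names here are illustrative):

```python
class Room:
    """One room on one CDN server: fan each packet out to every other member."""

    def __init__(self, name):
        self.name = name
        self.members = {}                    # member id -> outbound queue

    def join(self, member_id):
        self.members[member_id] = []

    def forward(self, sender_id, packet):
        for member_id, queue in self.members.items():
            if member_id != sender_id:       # never echo back to the speaker
                queue.append(packet)

room = Room("team-alpha")
for m in ("alice", "bob", "carol"):
    room.join(m)
room.forward("alice", b"\x01voice-frame")
# bob and carol each receive the packet; alice does not
```

Note that the copy cost grows with room size, which is exactly why a single server cannot carry a room with millions of members.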
Availability
Almost all of the above services are stateless. Even the stateful CDN manager can be regarded as stateless: when a CDN manager fails, the CDN servers in the cluster re-report their information to it after it restarts.
Stateless service failures are easy to handle: each server is independent and equivalent, so requests can be routed to any of them. The upstream service detects whether each downstream service is alive; when a server goes down, we only need to remove it from the cluster.
In addition, for the stateful CDN manager service, we add multiple backup servers distributed across equipment rooms in different regions, so the failure of one of them does not affect the normal operation of the entire CDN cluster.
Stateless services plus a retry mechanism ensure the maximum availability of the services: the client sends a retry request every second, the Round Robin algorithm dispatches the requests one by one, and faulty servers are effectively avoided. If the join-room timeout is 20 seconds, we get 20 chances to complete the join.
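A sketch of such a retry loop, where `try_join` is a hypothetical stand-in for the real handshake call and the server names are illustrative:

```python
import time

def join_room_with_retry(servers, try_join, timeout_s=20, interval_s=1.0):
    """Retry the join against round-robin-picked servers until the deadline."""
    deadline = time.monotonic() + timeout_s
    i = 0
    while time.monotonic() < deadline:
        server = servers[i % len(servers)]    # round-robin over candidates
        i += 1
        try:
            return try_join(server)           # success: stop retrying
        except ConnectionError:
            time.sleep(interval_s)            # faulty server, try the next
    raise TimeoutError("could not join room within the timeout")

attempts = []
def try_join(server):
    attempts.append(server)
    if server == "cdn-gw-1":                  # simulate one faulty server
        raise ConnectionError("unreachable")
    return f"joined via {server}"

result = join_room_with_retry(["cdn-gw-1", "cdn-gw-2"], try_join,
                              timeout_s=5, interval_s=0)
# result == "joined via cdn-gw-2": the faulty server was skipped
```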
Scalability
It is very simple to expand stateless services: deploy new servers, perform connectivity tests, modify the upstream servers’ configuration to add the new servers to the downstream list, and check the new servers’ traffic after the expansion.
Shrinking is similar: modify the upstream servers’ configuration to remove the servers from the list, wait for their traffic to drop to zero, then shut down and reclaim the removed servers.
When expanding or shrinking, join-room requests for the same room may be distributed to different clusters, causing the same room to exist in multiple clusters. Since different CDN clusters do not interoperate directly, both expanding and shrinking are lossy.
If lossy scaling is unacceptable, you need a non-relational database, such as Redis, to store historical room routes. This modification increases the complexity of some of the service logic.
Summary
Designing a service that can support tens of millions of PCUs is challenging, and there are many things to note.
First of all, the system must be divided into modules. You cannot rely on a single program, one architecture, or a simple third-party module to realize the entire service. A large system must be divided into small modules, and each subsystem must have a clear division of labor and clear functions.
The second is layering. After the modules are divided, what each module should do, how it should do it, and which modules each engineer is best at should be clearly subdivided, so that the system can be built more easily and efficiently.
The third is clustering. Don’t put all your eggs in one basket: emphasize off-site deployment, cluster isolation, and flexible scheduling between clusters, so that a single cluster failure does not spread to all clusters.
